Prediction of axillary lymph node metastasis using a magnetic resonance imaging radiomics model of invasive breast cancer primary tumor

Background This study aimed to investigate the clinical value of breast magnetic resonance imaging (MRI) radiomics for predicting axillary lymph node metastasis (ALNM) and to compare the discriminative abilities of different combinations of MRI sequences. Methods This study included 141 patients diagnosed with invasive breast cancer from two centers (center 1: n = 101; center 2: n = 40). Patients from center 1 were randomly divided into a training set and test set 1, and patients from center 2 were assigned to test set 2. All participants underwent preoperative MRI, and four distinct MRI sequences were obtained. The volume of interest (VOI) of the breast tumor was delineated on the dynamic contrast-enhanced (DCE) postcontrast phase 2 images, and the VOIs on the other sequences were adjusted when required. Radiomics features were then extracted from the VOIs using an open-source package. Both single- and multisequence radiomics models were constructed in the training set using the logistic regression method. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and precision of each radiomics model were calculated for test set 1 and test set 2. Finally, the diagnostic performance of each model was compared with that of a junior and a senior radiologist. Results The single-sequence ALNM classifier derived from DCE postcontrast phase 1 had the best performance for both test set 1 (AUC = 0.891) and test set 2 (AUC = 0.619). The best-performing multisequence ALNM classifier for both test set 1 (AUC = 0.910) and test set 2 (AUC = 0.717) combined the DCE postcontrast phase 1, T2-weighted imaging, and diffusion-weighted imaging single-sequence ALNM classifiers. In both test sets, this classifier achieved a higher AUC than the junior and senior radiologists. 
Conclusions The combination of DCE postcontrast phase 1, T2-weighted imaging, and diffusion-weighted imaging radiomics features had the best performance in predicting ALNM in breast cancer. Our study presents a promising noninvasive tool for preoperative ALNM prediction in patients with breast cancer.


Introduction
Breast cancer is a malignant tumor that threatens women's health and has become the most prevalent cancer worldwide [1]. The axillary lymph nodes (ALNs) receive approximately 70% of the lymphatic drainage of the breast and constitute the most important route of lymphatic metastasis in breast cancer. ALN status (with or without metastasis) is an important basis for accurate clinical staging, treatment planning, and prognostic evaluation in patients with breast cancer [2]. Clinicians commonly perform axillary lymph node dissection to determine ALN status. However, this invasive procedure carries a risk of complications such as arm numbness and upper limb edema [3]. Therefore, a noninvasive method for evaluating ALN status is needed to reduce unnecessary invasive surgery and its associated complications.
Radiomics is a noninvasive technique that extracts a multitude of quantitative features from medical images such as computed tomography (CT) and magnetic resonance imaging (MRI) scans. These features are subsequently used for lesion diagnosis and the prediction of disease-free survival [4][5][6]. Notably, radiomics has shown promise for the preoperative detection of lymph node metastasis in patients with breast cancer. Dong et al. first used radiomics based on fat-suppressed T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI) MRI sequences to predict ALN status in patients with breast cancer [7]. Their prediction model, based on radiomics features derived from these two sequences, achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.805. To improve the prediction of axillary lymph node metastasis (ALNM), Liu et al. integrated clinicopathological parameters with dynamic contrast-enhanced (DCE) MRI radiomics features [8], and the predictive performance of their strategy was comparable to that reported by Dong et al. In a separate investigation, Chai et al. evaluated the efficacy of four different MRI sequences for predicting ALNM and found that the second postcontrast phase of DCE had the highest performance, with an AUC of 0.860 [9]. Collectively, these findings indicate that MRI sequences differ in how informative they are for evaluating ALN status. Importantly, incorporating additional MRI sequences into a radiomics model may introduce noise, redundancy, or collinearity among features, potentially compromising model performance, stability, and generalizability [10]. Thus, identifying the optimal combination of MRI sequences is important. Furthermore, existing prediction models have commonly been developed using data from single centers and therefore lack external validation.
This study aimed to identify the optimal combination of multisequence MRI radiomics features for developing a prediction model to differentiate the ALN status of patients with breast cancer. We further validated the prediction model using MRI data from an independent center.

Patients and breast MRI acquisition
This study enrolled 101 patients with invasive breast cancer from Guangdong Province Hospital for Women and Children Healthcare (center 1) and 40 patients with invasive breast cancer from Yantai YuHuangDing Hospital (center 2) between March 2021 and January 2022. The inclusion criteria were as follows: (1) histopathological diagnosis of breast cancer, (2) available demographic and clinical information, and (3) dedicated breast MRI performed within 2 weeks before surgery. The exclusion criteria were as follows: (1) preoperative neoadjuvant chemotherapy or radiotherapy, (2) any other treatment before MRI, (3) other malignant tumors, and (4) poor image quality or incomplete sequences.
Patients from center 1 were imaged using a 3.0T MRI scanner (Ingenia, Philips Healthcare, Best, the Netherlands), whereas patients from center 2 were imaged using a 3.0T MRI scanner (Discovery, GE Healthcare, Milwaukee, Wisconsin, USA). During the examination, patients were placed in the prone position with the breasts hanging naturally in an eight-channel dedicated breast coil. In the axial plane, both breasts were centered in the field of view, which covered the whole of both breasts and the axillary regions. In the sagittal plane, the positioning line was parallel to the long axis of the breast. The imaging protocol comprised non-fat-suppressed T1-weighted imaging (T1WI), fat-suppressed T2WI, DWI, and DCE MRI using fat-suppressed T1WI. DCE comprised one precontrast phase (acquired 1 min before contrast injection) and four postcontrast phases (phases 1-4, acquired at the first to fourth minutes after contrast injection). Table 1 shows the details of the MRI protocols.

Radiomics feature extraction
A radiologist with 5 years of experience manually delineated the tumor volume of interest (VOI) on DCE postcontrast phase 2 images for each patient using the Medical Imaging Interaction Toolkit (MITK) software (v.2016.11.0; http://www.mitk.org/). Figure 1 shows an example VOI of a primary tumor delineated in MITK. The VOIs on the other sequences were then manually checked and adjusted when needed.
A series of harmonization techniques was applied to the MRI volumes before radiomics feature extraction [12]. First, images from the other sequences were resampled to the resolution, spacing, and position of DCE postcontrast phase 2 using linear interpolation. Next, to standardize images across all sequences, the mean and standard deviation of intensity were calculated for each MRI volume, and each volume was normalized using the z-score method, that is, by subtracting the mean intensity and dividing by the standard deviation [13,14]. PyRadiomics (v3.0.1; http://www.radiomics.io/pyradiomics.html), an open-source Python toolkit for extracting radiomics features from medical images, was used to calculate the radiomics features [15]. In total, 851 radiomics features were calculated from the tumor VOI on each sequence. These comprised 14 shape features, 18 first-order features, 24 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 14 gray-level dependence matrix (GLDM) features, 5 neighboring gray tone difference matrix (NGTDM) features, and 744 wavelet features. Details of all features are available online (https://pyradiomics.readthedocs.io/en/latest/features.html) [16].
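The z-score step and the feature tally above can be sanity-checked with a short sketch. It assumes each volume is available as a flat list of voxel intensities and uses the population standard deviation; the paper does not state which estimator was used or report its exact PyRadiomics settings. The tally further assumes the default 3D wavelet decomposition into eight sub-bands, each yielding every non-shape feature class, which reproduces the reported 744 wavelet and 851 total features.

```python
import statistics


def zscore_normalize(intensities):
    """Z-score normalize a flattened volume: (I - mean) / SD over all voxels."""
    mean = statistics.fmean(intensities)
    sd = statistics.pstdev(intensities)  # population SD (assumed estimator)
    if sd == 0:
        raise ValueError("constant volume: standard deviation is zero")
    return [(value - mean) / sd for value in intensities]


# Per-class feature counts on the original image, as reported in the text.
ORIGINAL_COUNTS = {
    "shape": 14,
    "firstorder": 18,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "gldm": 14,
    "ngtdm": 5,
}
N_WAVELET_SUBBANDS = 8  # assumed: 3D wavelet -> LLL, LLH, ..., HHH

# Shape features describe the mask, not intensities, so they are not repeated per sub-band.
non_shape = sum(n for name, n in ORIGINAL_COUNTS.items() if name != "shape")
wavelet = N_WAVELET_SUBBANDS * non_shape
total = sum(ORIGINAL_COUNTS.values()) + wavelet
print(non_shape, wavelet, total)
```

Run as written, this prints `93 744 851`: 93 intensity-based features per image, repeated across 8 sub-bands, plus the 107 features of the original image.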
In the DWI sequence, radiomics features were extracted directly from the diffusion-weighted images. Apparent

Feature selection and ALNM classifier modeling
To reduce irrelevant and redundant features, a three-stage feature selection procedure was performed for each MRI sequence individually. First, radiomics features with a variance < 0.05 were removed. Second, a Pearson correlation matrix was constructed from pairwise feature correlations, the mean absolute correlation of each feature was calculated, and, among highly correlated features, the one with the highest mean absolute correlation was eliminated. Third, to lower the risk of model overfitting, the minimum redundancy maximum relevance (mRMR) feature selection strategy was used to keep the number of features within one-tenth of the number of training samples [17,18].
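A minimal pure-Python sketch of the three-stage selection follows; it is an illustration, not the authors' R code. Several details are assumptions not stated in the text: the correlation cutoff (0.9 here, a hypothetical value), a caret-style rule that drops, from the most correlated pair, the feature with the higher mean absolute correlation, and a simplified mRMR that uses absolute Pearson correlation as a stand-in for mutual information. The cap of one feature per ten training samples reflects our reading of the one-tenth rule.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_correlated(features, cutoff=0.9):
    """Stage 2: repeatedly remove one feature from the most correlated pair above the cutoff."""
    keep = list(features)
    while True:
        worst = None
        for i, a in enumerate(keep):
            for b in keep[i + 1:]:
                r = abs(pearson(features[a], features[b]))
                if r > cutoff and (worst is None or r > worst[0]):
                    worst = (r, a, b)
        if worst is None:
            return keep
        _, a, b = worst

        def mean_abs_corr(name):
            return statistics.fmean(
                [abs(pearson(features[name], features[o])) for o in keep if o != name]
            )

        # Drop whichever member of the pair is more correlated with the remaining features.
        keep.remove(a if mean_abs_corr(a) >= mean_abs_corr(b) else b)


def mrmr(features, labels, k):
    """Stage 3: greedy mRMR (relevance minus mean redundancy), correlation-based proxy."""
    relevance = {f: abs(pearson(v, labels)) for f, v in features.items()}
    selected, candidates = [], sorted(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = statistics.fmean(
            [abs(pearson(features[f], features[s])) for s in selected]
        )
        return relevance[f] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def select_features(features, labels, var_min=0.05, cutoff=0.9):
    """Run all three stages; `features` maps feature name -> per-patient values."""
    stage1 = {f: v for f, v in features.items() if statistics.variance(v) >= var_min}
    stage2 = {f: stage1[f] for f in drop_correlated(stage1, cutoff)}
    k = max(1, len(labels) // 10)  # assumed reading of the one-tenth rule
    return mrmr(stage2, labels, k)
```

With the 76 training patients described below, this cap would allow at most 7 features per sequence.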
To assess the validity and generalizability of the ALNM classifier, the 101 patients from center 1 were randomly divided into a training set (n = 76) and test set 1 (n = 25) at a ratio of approximately 3:1, and the 40 patients from center 2 formed test set 2 (n = 40).
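The split can be reproduced with a seeded shuffle. The patient IDs and the seed below are hypothetical, and because the text does not say whether the split was stratified by ALN status, this sketch uses a plain random split.

```python
import random


def split_center1(patient_ids, n_train=76, seed=2021):
    """Randomly split center-1 patients into a training set and test set 1."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


center1 = [f"C1-{i:03d}" for i in range(1, 102)]  # 101 hypothetical patient IDs
center2 = [f"C2-{i:03d}" for i in range(1, 41)]   # 40 hypothetical patient IDs

train, test1 = split_center1(center1)
test2 = list(center2)  # external test set: center 2 is never split or used for training
print(len(train), len(test1), len(test2))  # 76 25 40
```

Keeping center 2 entirely outside the training loop is what makes test set 2 an external validation set.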
For single-sequence ALNM classifier modeling, the selected radiomics features were fed into a least absolute shrinkage and selection operator (LASSO) model [19]. LASSO fits a generalized linear model with an L1 penalty, performing feature selection and regularization simultaneously, and it has performed well in previous breast radiomics studies [20]. For each sequence, 3-fold cross-validation was used to obtain a robust single-sequence ALNM classifier. Multisequence ALNM classifiers were constructed by integrating single-sequence ALNM classifiers in a multivariate logistic regression model, using combinations of T1WI, T2WI, DWI, and the DCE phase with the best validation performance. In total, four single-sequence ALNM classifiers and 11 multisequence ALNM classifiers were built. Each single-sequence ALNM classifier was a linear weighted sum of radiomics features, and each multisequence ALNM classifier was a linear weighted sum of the outputs of the single-sequence ALNM classifiers.
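The two modeling levels can be sketched as follows. In practice the weights and intercepts would come from the fitted LASSO and logistic regression models (Table 3 lists the radiomics weights); any values passed to these functions here are hypothetical. Enumerating every combination of two or more of the four single-sequence classifiers also confirms the count of 11 multisequence models (6 pairs, 4 triples, and 1 quadruple).

```python
import math
from itertools import combinations

SEQUENCES = ["T1WI", "T2WI", "DWI", "DCE_best"]  # DCE_best: best-validating DCE phase

# Multisequence models: every combination of two or more single-sequence classifiers.
MULTI = [c for r in range(2, len(SEQUENCES) + 1) for c in combinations(SEQUENCES, r)]


def single_sequence_score(features, weights, intercept):
    """Single-sequence classifier output: intercept + sum of weight * feature."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def multisequence_probability(scores, coefs, intercept):
    """Logistic model over single-sequence scores; returns the predicted ALNM probability."""
    logit = intercept + sum(b * s for b, s in zip(coefs, scores))
    return 1.0 / (1.0 + math.exp(-logit))


print(len(SEQUENCES), len(MULTI))  # 4 11
```

For example, `multisequence_probability([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.0)` returns 0.5, the probability at a zero logit.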
The ROC curve, AUC, sensitivity, specificity, accuracy, and precision of each model were analyzed to assess ALNM classifier performance. Figure 2 shows the flowchart of ALNM classifier development and validation.
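These performance measures can be computed from predicted probabilities as sketched below. AUC uses the Mann-Whitney formulation (ties count as one half); the threshold-dependent metrics assume a cutoff of 0.5, since the text does not report the operating point used.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random ALNM case outscores a random NALNM case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(probs, labels, threshold=0.5):
    """Sensitivity, specificity, accuracy, and precision at an assumed cutoff."""
    pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for q, y in zip(pred, labels) if q == 1 and y == 1)
    tn = sum(1 for q, y in zip(pred, labels) if q == 0 and y == 0)
    fp = sum(1 for q, y in zip(pred, labels) if q == 1 and y == 0)
    fn = sum(1 for q, y in zip(pred, labels) if q == 0 and y == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }
```

For probabilities `[0.9, 0.8, 0.3, 0.6, 0.2, 0.1]` with labels `[1, 1, 1, 0, 0, 0]`, the AUC is 8/9, and sensitivity, specificity, accuracy, and precision at 0.5 are each 2/3.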
To compare the performance of the radiomics model with radiologists' diagnoses, two radiologists (a junior radiologist with 5 years of experience and a senior radiologist with 20 years of experience) independently reviewed all original images of each case in test set 1 and test set 2 and made a diagnostic decision (ALNM vs. no ALNM [NALNM]). ALNM was diagnosed when enlarged lymph nodes (diameter ≥ 10 mm), single or multiple, showed disappearance of the lymph node hilum and uneven circular enhancement on contrast-enhanced MRI images, or when multiple small lymph nodes (diameter < 10 mm) showed disappearance of the hilum and uneven circular enhancement. Cases that met none of these criteria were classified as NALNM.
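One way to encode the reading criteria as a decision rule is shown below. This reflects our interpretation of the wording (any enlarged node, or at least two small nodes, showing hilum disappearance and uneven circular enhancement); the `Node` fields are hypothetical, and in practice these findings are judged visually by the radiologist.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One axillary lymph node as assessed on contrast-enhanced MRI (hypothetical fields)."""
    diameter_mm: float
    hilum_disappeared: bool
    uneven_circular_enhancement: bool


def read_case(nodes, size_cutoff_mm=10.0):
    """Return 'ALNM' or 'NALNM' under our reading of the stated criteria."""
    suspicious = [n for n in nodes if n.hilum_disappeared and n.uneven_circular_enhancement]
    enlarged = [n for n in suspicious if n.diameter_mm >= size_cutoff_mm]
    small = [n for n in suspicious if n.diameter_mm < size_cutoff_mm]
    # Any enlarged suspicious node, or multiple small suspicious nodes, indicates ALNM.
    if enlarged or len(small) >= 2:
        return "ALNM"
    return "NALNM"
```

Under this reading, a single 8-mm node with both findings would be read as NALNM, whereas two such nodes would be read as ALNM.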

Statistical analysis
The categorical variables of patients with ALNM and NALNM were compared using the chi-square test. The Shapiro-Wilk test was used to assess the normality of continuous variables, which are presented as mean ± standard deviation for normally distributed data and as median (interquartile range) for non-normally distributed data. Normally and non-normally distributed continuous variables were compared using Student's t-test and the Wilcoxon rank-sum test, respectively. Differences between ROC curves were compared using the DeLong test [21]. Statistical analysis, feature selection, and model development and validation were performed using R software (version 4.0.3, https://www.r-project.org/). All statistical tests were two-tailed, and P < 0.05 was considered statistically significant.

Fig. 2
Fig. 2 Workflow for the construction and validation of the ALNM classifier
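For readers who want to see how the DeLong comparison of two correlated ROC curves works, a minimal sketch follows. It implements the structural-components variance estimate of DeLong et al. for two sets of scores obtained on the same patients (for example, model probabilities and a radiologist's binary reads). The authors used R; this Python version is only an illustration.

```python
import math


def _structural_components(scores, labels):
    """DeLong placement values: V10 for each positive case, V01 for each negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(x, y):
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores1, scores2, labels):
    """Two-sided DeLong test for two correlated AUCs computed on the same cases."""
    v10_1, v01_1 = _structural_components(scores1, labels)
    v10_2, v01_2 = _structural_components(scores2, labels)
    auc1 = sum(v10_1) / len(v10_1)
    auc2 = sum(v10_2) / len(v10_2)
    m, n = len(v10_1), len(v01_1)  # numbers of positive and negative cases
    var = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n
    )
    if var <= 0:
        raise ValueError("variance of the AUC difference is not positive")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return auc1, auc2, z, p
```

With labels `[1, 1, 1, 0, 0, 0]`, model scores `[0.9, 0.8, 0.3, 0.6, 0.2, 0.1]`, and binary reads `[1, 0, 1, 1, 0, 0]`, the AUCs are 8/9 and 2/3 and the two-sided P value is about 0.37, so this toy difference would not be significant.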

Characteristics of the patients
The age of patients with NALNM in the training set was not normally distributed (P = 0.001). There were no significant differences in age, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER-2) status, Ki-67 expression, or molecular subtype between the NALNM and ALNM groups in the training set, test set 1, or test set 2. Table 2 summarizes the clinical and pathological characteristics of the patients.
Table 4 presents the AUC, accuracy, sensitivity, and specificity of the single-sequence and multisequence ALNM classifiers. Among all DCE phases, the single-sequence ALNM classifier derived from DCE postcontrast phase 1 had the best performance for both test set 1 (AUC = 0.891) and test set 2 (AUC = 0.619). The best-performing multisequence ALNM classifier for both test set 1 (AUC = 0.910) and test set 2 (AUC = 0.717) combined the DCE postcontrast phase 1, T2WI, and DWI single-sequence ALNM classifiers. In test set 1, the AUC of the DCE postcontrast phase 1 + T2WI + DWI model was significantly higher than that of the senior radiologist (0.910 vs. 0.641, P = 0.012). In test set 2, the AUC of this model did not differ significantly from that of the senior radiologist (0.717 vs. 0.650, P = 0.569). Figure 3 shows the ROC curves of the best-performing single-sequence and multisequence ALNM classifiers and of the two radiologists.

Discussion
This retrospective study compared the performance of prediction models based on different combinations of multisequence MRI radiomics features for differentiating the ALN status of patients with breast cancer. Data from two centers were used, and the inclusion of the second center as an external validation set supports the generalizability and robustness of the models. The model incorporating radiomics features from DCE postcontrast phase 1, T2WI, and DWI had the best performance in both the internal and external validation sets. Furthermore, the diagnostic performance of the MRI radiomics model was compared with that of two radiologists, and the model achieved higher AUCs than both, although not all of these differences were statistically significant. Our findings are in accordance with a subset of prior studies showing that clinicopathologic features, including age, ER, PR, and HER-2 status, Ki-67 expression, and molecular subtype, were not significantly correlated with ALNM [20,22,23]. Consequently, these clinicopathologic characteristics were excluded from our models. Nonetheless, most researchers consider ER and PR status, Ki-67 expression, and molecular subtype to be predictive factors of ALNM [24][25][26][27][28]. These discrepancies may be attributable to differences in study populations and to the small and unevenly distributed sample in our study.
The single-sequence ALNM classifier derived from DCE postcontrast phase 1 had the best performance in test set 1 (AUC = 0.891) and test set 2 (AUC = 0.619), similar to the finding of Liu et al. [29]. This may be because DCE-MRI characterizes breast lesions by evaluating both tumor morphology and hemodynamics. Breast cancer commonly has a rich blood supply, and its enhancement is most pronounced in the early postcontrast period. Therefore, DCE postcontrast phase 1 may best display the boundaries, heterogeneity, and invasiveness of breast cancer lesions [30].
Numerous studies have investigated the optimal combination of sequences, often selecting T2WI, DWI, and enhancement sequences based on prior experience [7,8]. Dong et al. reported that a radiomics model combining T2WI and DWI achieved an AUC of 0.805 [7]. DWI can provide additional insight into the diffusion-perfusion characteristics of the primary tumor [31], whereas T2WI offers superior tissue contrast and textural features that enhance discrimination. In our study, combining DCE postcontrast phase 1 radiomics with T2WI and DWI radiomics yielded a higher AUC (0.910 in test set 1). This suggests that the enhancement sequence adds information through improved lesion visualization, in accordance with clinical observations. The optimal multisequence ALNM classifier also achieved higher AUCs than both radiologists (test set 1: 0.910 vs. 0.641 and 0.548; test set 2: 0.717 vs. 0.583 and 0.650), although its difference from the senior radiologist in test set 2 was not statistically significant. This underscores the potential utility of radiomics models in clinical diagnostics, a conclusion supported by previous research [32,33].
The single-sequence ALNM classifier derived from DCE postcontrast phase 1 comprised first-order and GLSZM features, consistent with previous radiomics studies [34,35]. In the T2WI- and DWI-based models, wavelet features predominated, consistent with a previous study suggesting that wavelet features are important building blocks of radiomics models [36].
Similar to previous studies, our study predicted ALN status based solely on radiomics features extracted from primary tumor ROIs. Radiomics models that also incorporate peritumoral ROIs may capture richer textural information than those based solely on primary tumor ROIs, thereby improving the accuracy and completeness of prediction models [37]. For example, Liu et al. built radiomics models from intratumoral, 3-mm peritumoral, and 5-mm peritumoral DCE-MRI features to predict ALNM using various machine learning algorithms [38]. Their combined intratumoral and 3-mm peritumoral model, constructed using a backpropagation neural network (BPNN), showed the best predictive performance. This suggests that the peritumoral microenvironment may play a significant role in predicting tumor aggressiveness.
Abbreviated MRI (AB-MRI) protocols have recently gained attention because they may reduce MRI costs by shortening image acquisition and interpretation times [39]. A meta-analysis comparing the diagnostic accuracy of AB-MRI with that of full diagnostic protocol MRI (FDP-MRI) in both screening and enriched cohorts found no significant differences in sensitivity or specificity between the two methods [40]. Several AB-MRI protocols recommend including T1-weighted pre- and postcontrast sequences. This study identified an optimal sequence combination of T2WI, DCE postcontrast phase 1, and DWI, which supports the practicality of AB-MRI and may inform the design of its protocols. Notably, the AB-MRI protocol proposed by Kuhl et al. [41] included, in addition to the conventional DCE precontrast and postcontrast phase 1 sequences, special reconstructions such as subtraction and maximum-intensity projection (MIP) images, and its diagnostic performance was comparable to that of the full diagnostic protocol. In our retrospective study, owing to the limited storage capacity of the picture archiving and communication system (PACS), the subtraction and MIP images of each patient could not be retrieved in time, and only the standard reconstructions were used. Future studies could deliberately include MIP and subtraction images and continue to search for MRI sequence combinations with higher diagnostic efficacy. Moreover, breast MRI is advantageous for evaluating the axillary lymph nodes, whereas mammography and contrast-enhanced spectral mammography do not comprehensively cover them. Because our approach predicts ALNM from the primary lesion rather than from the axilla, it could be extended to primary lesions delineated on mammography or contrast-enhanced spectral mammography images, an approach supported by the findings of Mao et al. [42].
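To illustrate the two reconstructions mentioned above, the sketch below computes a voxel-wise subtraction image and a maximum-intensity projection along the slice axis. It assumes co-registered pre- and postcontrast volumes stored as nested `[z][y][x]` lists on the same grid; clinical workstations typically add motion correction and other processing that is not shown here.

```python
def subtraction_image(post, pre):
    """Voxel-wise postcontrast minus precontrast; volumes are nested [z][y][x] lists."""
    return [
        [[a - b for a, b in zip(row_post, row_pre)] for row_post, row_pre in zip(sl_post, sl_pre)]
        for sl_post, sl_pre in zip(post, pre)
    ]


def mip(volume):
    """Maximum-intensity projection along the slice (z) axis, giving a 2-D [y][x] image."""
    ny, nx = len(volume[0]), len(volume[0][0])
    return [[max(sl[y][x] for sl in volume) for x in range(nx)] for y in range(ny)]
```

Applying `mip` to a subtraction image keeps, for each in-plane position, the strongest enhancement found in any slice, which is why the two reconstructions are often used together.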

Limitations
This study had some limitations. First, although our findings are promising, the limited number of patients may restrict the generalizability of our results; we therefore plan to expand our study by including more patients and collaborating with additional centers. Second, the heterogeneity of images acquired on different scanners at multiple centers could not be eliminated. Third, radiomics feature extraction required the presegmentation of VOIs, which still depends on manual delineation by radiologists. This step is time-consuming, error-prone, and poorly reproducible. For radiomics models to be applied in clinical practice, a reliable and efficient automated segmentation method is needed [43,44]. Fourth, our study considered only primary tumor MRI radiomics features and clinicopathologic features; ALN and peritumoral radiomics features were not used, which limits the comprehensiveness of the prediction model. Finally, VOIs were delineated exclusively on DCE-MRI images. DCE images offer valuable temporal information about the "wash-in" and "wash-out" of contrast agents, but they may not fully exploit the enhanced contrast available in difference images between postcontrast and precontrast phases. Difference images can provide stronger contrast, potentially enabling more precise tumor delineation.

Conclusion
This study examined the impact of several MRI sequences on ALNM prediction. This comparative analysis may help the community better understand the relative benefits of various MRI sequences for radiomics-based ALNM discrimination. Nevertheless, future studies should enroll patients from more centers, include larger samples, and develop more reliable predictive models with greater generalization ability.

Fig. 1
Fig. 1 MRI images of a 56-year-old woman with breast cancer. (a) DCE postcontrast phase 2 image. (b) VOI of the primary tumor manually delineated by the radiologist.
Figure 4 depicts the MRI images of four representative cases.

Fig. 4
Fig. 4 MRI images of four representative cases. (a, e, i, and m) DCE postcontrast phase 1 images of the primary tumor. (b, f, j, and n) T2WI images of the primary tumor. (c, g, k, and o) DWI images of the primary tumor. (d, h, l, and p) DCE postcontrast phase 1 images of the ALN. (a-d) A 57-year-old woman with breast cancer and pathologically confirmed left ALNM. The patient was misdiagnosed with NALNM by the junior radiologist but was correctly diagnosed by the senior radiologist and the combined DCE postcontrast phase 1 + T2WI + DWI model. (e-h) A 66-year-old woman with breast cancer and pathologically confirmed right NALNM. The patient was misdiagnosed with ALNM by the senior radiologist but was correctly diagnosed by the junior radiologist and the combined DCE postcontrast phase 1 + T2WI + DWI model. (i-l) A 65-year-old woman with breast cancer and pathologically confirmed right NALNM. The patient was misdiagnosed with ALNM by both the senior and junior radiologists but was correctly diagnosed by the combined DCE postcontrast phase 1 + T2WI + DWI model. (m-p) A 59-year-old woman with breast cancer and pathologically confirmed left ALNM. The patient was correctly diagnosed by the senior and junior radiologists and the combined DCE postcontrast phase 1 + T2WI + DWI model.

Table 1
MRI protocols of the two scanners. TR: repetition time; TE: echo time; FOV: field of view

Table 2
Patient profiles in the training set and test sets

Table 3
Radiomics features and weights in single-sequence ALNM classifiers

Table 4
Performance of the ALNM models and radiologists